Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping

Authors

  • Andrew W. Moore
  • Christopher G. Atkeson
Abstract

MIT AI Lab, NE43-771, 545 Technology Square, Cambridge, MA 02139

We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning have fast real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state space. We compare Prioritized Sweeping with other reinforcement learning schemes for a number of different stochastic optimal control problems. It successfully solves large state-space real-time problems with which other methods have difficulty.

1 STOCHASTIC PREDICTION

The paper introduces a memory-based technique, prioritized sweeping, which is used both for stochastic prediction and reinforcement learning. A fuller version of this paper is in preparation [Moore and Atkeson, 1992]. Consider the 500-state Markov system depicted in Figure 1. The system has sixteen absorbing states, depicted by white and black circles. The prediction problem is to estimate, for every state, the long-term probability that it will terminate in a white, rather than black, circle. The data available to the learner is a sequence of observed state transitions. Let us consider two existing methods along with prioritized sweeping.
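The prediction problem described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a transition model estimated from observed counts, and a priority rule in which a predecessor's priority is its estimated transition probability into a state multiplied by how much that state's value just changed. The class name `PrioritizedSweepingPredictor` and the parameters `max_backups` and `threshold` are hypothetical, introduced only for illustration.

```python
import heapq
import itertools
from collections import defaultdict


class PrioritizedSweepingPredictor:
    """Estimates, per state, the probability of terminating in a 'white' state."""

    def __init__(self, white, black, max_backups=10, threshold=1e-6):
        # Absorbing states have fixed values: 1 for white, 0 for black.
        self.terminal_value = {s: 1.0 for s in white}
        self.terminal_value.update({s: 0.0 for s in black})
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[s][s2]
        self.preds = defaultdict(set)                        # preds[s2] = {s, ...}
        self.value = defaultdict(float, self.terminal_value)
        self.priority = {}            # live priority of each queued state
        self.heap = []                # (-priority, tiebreak, state), stale entries allowed
        self.tiebreak = itertools.count()
        self.max_backups = max_backups  # assumed per-observation compute budget
        self.threshold = threshold      # assumed minimum priority worth queuing

    def _push(self, state, prio):
        if state in self.terminal_value or prio <= self.threshold:
            return
        if prio > self.priority.get(state, 0.0):
            self.priority[state] = prio
            heapq.heappush(self.heap, (-prio, next(self.tiebreak), state))

    def _backup(self, state):
        # One dynamic-programming backup using the empirical transition model.
        total = sum(self.counts[state].values())
        new = sum(n / total * self.value[s2]
                  for s2, n in self.counts[state].items())
        delta = abs(new - self.value[state])
        self.value[state] = new
        return delta

    def observe(self, s, s2):
        """Record one observed transition s -> s2, then do a bounded sweep."""
        self.counts[s][s2] += 1
        self.preds[s2].add(s)
        self._push(s, float("inf"))  # the state just observed is processed first
        for _ in range(self.max_backups):
            # Pop the highest-priority entry that is still current.
            while self.heap:
                neg, _, state = heapq.heappop(self.heap)
                if self.priority.get(state) == -neg:
                    break
            else:
                return  # queue exhausted
            del self.priority[state]
            delta = self._backup(state)
            for p in self.preds[state]:
                prob = self.counts[p][state] / sum(self.counts[p].values())
                self._push(p, prob * delta)


predictor = PrioritizedSweepingPredictor(white={"W"}, black={"K"})
predictor.observe("a", "b")  # nothing known about b yet, so a stays at 0
predictor.observe("b", "W")  # b becomes 1, and the change sweeps back to a
predictor.observe("a", "K")  # a now splits evenly between b and K
print(predictor.value["a"], predictor.value["b"])
```

In this toy run the second observation changes the value of `b`, and the queue then propagates that change to its predecessor `a` without sweeping every state, which is the effect the abstract attributes to prioritizing dynamic programming sweeps.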

Similar resources

Is prioritized sweeping the better episodic control?

Episodic control has been proposed as a third approach to reinforcement learning, besides model-free and model-based control, by analogy with the three types of human memory, i.e. episodic, procedural, and semantic memory. But the theoretical properties of episodic control are not well investigated. Here I show that in deterministic tree Markov decision processes, episodic control is equivalent ...

Generalized Prioritized Sweeping

Prioritized sweeping is a model-based reinforcement learning method that attempts to focus an agent’s limited computational resources to achieve a good estimate of the value of environment states. To choose effectively where to spend a costly planning step, classic prioritized sweeping uses a simple heuristic to focus computation on the states that are likely to have the largest errors. In this...

Planning by Prioritized Sweeping with Small Backups

Efficient planning plays a crucial role in model-based reinforcement learning. Traditionally, the main planning operation is a full backup based on the current estimates of the successor states. Consequently, its computation time is proportional to the number of successor states. In this paper, we introduce a new planning backup that uses only the current value of a single successor state and h...

Prioritized Sweeping Converges to the Optimal Value Function

Prioritized sweeping (PS) and its variants are model-based reinforcement-learning algorithms that have demonstrated superior performance in terms of computational and experience efficiency in practice. This note establishes the first—to the best of our knowledge—formal proof of convergence to the optimal value function when they are used as planning algorithms. We also describe applications of ...

Prioritized Sweeping Reinforcement Learning Based Routing for MANETs

In this paper, prioritized sweeping confidence-based dual reinforcement learning for adaptive network routing is investigated. Shortest-path routing is not always suitable for wireless mobile networks: under high traffic it always selects the path with the fewest hops between source and destination, thus generating more congestion. In prior...

Publication date: 1992